399 research outputs found

    Macro-evolutionary models and coalescent point processes: The shape and probability of reconstructed phylogenies

    Forward-time models of diversification (i.e., speciation and extinction) produce phylogenetic trees that grow "vertically" as time goes by. Pruning the extinct lineages out of such trees leads to natural models for reconstructed trees (i.e., phylogenies of extant species). Alternatively, reconstructed trees can be modelled by coalescent point processes (CPPs), where trees grow "horizontally" by the sequential addition of vertical edges. Each new edge starts at some random speciation time and ends at the present time; speciation times are drawn independently from the same distribution. CPPs lead to extremely fast computation of tree likelihoods and simulation of reconstructed trees. Their topology always follows the uniform distribution on ranked tree shapes (URT). We characterize which forward-time models lead to URT reconstructed trees and, among these, which lead to CPP reconstructed trees. We show that for any "asymmetric" diversification model in which speciation rates depend only on time and extinction rates depend only on time and on a non-heritable trait (e.g., age), the reconstructed tree is CPP, even if extant species are incompletely sampled. If rates additionally depend on the number of species, the reconstructed tree is (only) URT (but not CPP). We characterize the common distribution of speciation times in the CPP description, and discuss incomplete species sampling as well as three special model cases in detail: 1) extinction rate does not depend on a trait; 2) rates do not depend on time; 3) mass extinctions may happen additionally at certain points in the past.
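The "horizontal" construction described in this abstract can be illustrated with a minimal sketch. It assumes a fixed number of tips and a user-supplied sampler for the common speciation-time distribution; the exponential sampler in the usage note is a placeholder chosen for illustration, not the distribution derived in the paper, and the function names are hypothetical.

```python
import random


def simulate_cpp_depths(n_tips, draw_depth, rng):
    """Draw the n_tips - 1 i.i.d. node depths of a coalescent point process.

    depths[k] is the speciation time (measured back from the present)
    at which tip k + 1 attaches to the tree built from tips 0..k.
    draw_depth is an assumed, user-supplied sampler taking an RNG.
    """
    if n_tips < 1:
        raise ValueError("need at least one tip")
    return [draw_depth(rng) for _ in range(n_tips - 1)]


def divergence_time(depths, i, j):
    """Time back to the most recent common ancestor of tips i and j.

    In a CPP this is the largest node depth lying between the two tips.
    """
    if i == j:
        return 0.0
    lo, hi = sorted((i, j))
    return max(depths[lo:hi])
```

For example, `simulate_cpp_depths(5, lambda r: r.expovariate(1.0), random.Random(1))` returns four depths, and `divergence_time(depths, 0, 4)` is their maximum, i.e. the root age of the simulated tree. The whole tree is determined by one pass over the list of depths, which is what makes simulation under a CPP fast.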

    Bayesian phylogenetic estimation of fossil ages

    Recent advances have allowed for both morphological fossil evidence and molecular sequences to be integrated into a single combined inference of divergence dates under the rule of Bayesian probability. In particular, the fossilized birth-death tree prior and the Lewis-Mk model of discrete morphological evolution allow for the estimation of both divergence times and phylogenetic relationships between fossil and extant taxa. We exploit this statistical framework to investigate the internal consistency of these models by producing phylogenetic estimates of the age of each fossil in turn, within two rich and well-characterized data sets of fossil and extant species (penguins and canids). We find that the estimation accuracy of fossil ages is generally high, with credible intervals seldom excluding the true age and median relative error in the two data sets of 5.7% and 13.2% respectively. The median relative standard error (RSD) was 9.2% and 7.2% respectively, suggesting good precision, although with some outliers. In fact, in the two data sets we analyze, the phylogenetic estimate of fossil age is on average < 2 My from the midpoint age of the geological strata from which the fossil was excavated. The high level of internal consistency found in our analyses suggests that the Bayesian statistical model employed is an adequate fit for both the geological and morphological data, and provides evidence from real data that the framework used can accurately model the evolution of discrete morphological traits coded from fossil and extant taxa. We anticipate that this approach will have diverse applications beyond divergence time dating, including dating fossils that are temporally unconstrained, testing of the "morphological clock", and uncovering potential model misspecification and/or data errors when controversial phylogenetic hypotheses are obtained based on combined divergence dating analyses. Comment: 28 pages, 8 figures
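The accuracy summary quoted in this abstract (median relative error) can be sketched as follows. This is only an illustration of the statistic under the usual definition |estimate - true| / true; the function name is hypothetical and the abstract does not state how the authors computed it.

```python
from statistics import median


def median_relative_error(true_ages, estimated_ages):
    """Median of |estimate - true| / true over paired fossil ages.

    Assumes both sequences have the same length and that every
    true age is strictly positive.
    """
    if len(true_ages) != len(estimated_ages):
        raise ValueError("sequences must have the same length")
    return median(
        abs(est - true) / true
        for true, est in zip(true_ages, estimated_ages)
    )
```

With made-up toy ages `[10, 20, 40]` and estimates `[11, 19, 40]`, the relative errors are 0.1, 0.05 and 0.0, so the function returns 0.05.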

    A polynomial time algorithm for calculating the probability of a ranked gene tree given a species tree

    In this paper, we provide a polynomial time algorithm to calculate the probability of a ranked gene tree topology for a given species tree, where a ranked tree topology is a tree topology with the internal vertices being ordered. The probability of a gene tree topology can thus be calculated in polynomial time if the number of orderings of the internal vertices is a polynomial number. However, the complexity of calculating the probability of a gene tree topology with an exponential number of rankings for a given species tree remains unknown.
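To make "the number of orderings of the internal vertices" concrete, the sketch below counts the rankings of a binary tree topology. It is not the paper's algorithm (which computes probabilities); it only counts rankings, and the nested-tuple tree representation and function names are assumptions made for illustration.

```python
from math import comb


def count_rankings(tree):
    """Return (internal_nodes, rankings) for a binary tree topology.

    A leaf is any non-tuple value; an internal vertex is a 2-tuple
    (left, right). If the two subtrees have a and b internal vertices,
    their orderings can be interleaved in C(a + b, a) ways, and the
    parent vertex always comes first.
    """
    if not isinstance(tree, tuple):
        return 0, 1
    left, right = tree
    a, rank_left = count_rankings(left)
    b, rank_right = count_rankings(right)
    return a + b + 1, comb(a + b, a) * rank_left * rank_right


def number_of_rankings(tree):
    """Number of ways to order the internal vertices of the topology."""
    return count_rankings(tree)[1]
```

A caterpillar tree such as `((("a", "b"), "c"), "d")` has exactly one ranking, whereas the balanced four-leaf tree `(("a", "b"), ("c", "d"))` has two and the balanced eight-leaf tree has 80. The count grows quickly for balanced shapes, which is the case the abstract leaves open.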

    Simulating Trees with a Fixed Number of Extant Species

    In this paper, I develop efficient tools to simulate trees with a fixed number of extant species. The tools are provided in my open source R-package TreeSim available on CRAN. The new model presented here is a constant rate birth-death process with mass extinction and/or rate shift events at arbitrarily fixed times 1) before the present or 2) after the origin. The simulation approach for case (2) can also be used to simulate under more general models with fixed events after the origin. I use the developed simulation tools to show that a mass extinction event cannot be distinguished from a model with constant speciation and extinction rates interrupted by a phase of stasis based on trees consisting of only extant species. However, once we distinguish between mass extinction and a period of stasis based on paleontological data, fast simulations of trees with a fixed number of species allow inference of speciation and extinction rates using approximate Bayesian computation and allow for robustness analysis once maximum likelihood parameter estimations are available.

    Phylogenetic analysis accounting for age-dependent death and sampling with applications to epidemics

    The reconstruction of phylogenetic trees based on viral genetic sequence data sequentially sampled from an epidemic provides estimates of the past transmission dynamics, by fitting epidemiological models to these trees. To our knowledge, none of the epidemiological models currently used in phylogenetics can account for recovery rates and sampling rates dependent on the time elapsed since transmission. Here we introduce an epidemiological model where infectives leave the epidemic, either by recovery or sampling, after some random time which may follow an arbitrary distribution. We derive an expression for the likelihood of the phylogenetic tree of sampled infectives under our general epidemiological model. The analytic concept developed in this paper will facilitate inference of past epidemiological dynamics and provide an analytical framework for performing very efficient simulations of phylogenetic trees under our model. The main idea of our analytic study is that the non-Markovian epidemiological model, giving rise to phylogenetic trees growing vertically as time goes by, can be represented by a Markovian "coalescent point process" growing horizontally by the sequential addition of pairs of coalescence and sampling times. As examples, we discuss two special cases of our general model, namely an application to influenza and an application to HIV. Though phrased in epidemiological terms, our framework can also be used, for instance, to fit macroevolutionary models to phylogenies of extant and extinct species, accounting for general species lifetime distributions. Comment: 30 pages, 2 figures

    Bayesian inference of sampled ancestor trees for epidemiology and fossil calibration

    Phylogenetic analyses which include fossils or molecular sequences that are sampled through time require models that allow one sample to be a direct ancestor of another sample. As previously available phylogenetic inference tools assume that all samples are tips, they do not allow for this possibility. We have developed and implemented a Bayesian Markov Chain Monte Carlo (MCMC) algorithm to infer what we call sampled ancestor trees, that is, trees in which sampled individuals can be direct ancestors of other sampled individuals. We use a family of birth-death models where individuals may remain in the tree process after sampling; in particular, we extend the birth-death skyline model [Stadler et al, 2013] to sampled ancestor trees. This method allows the detection of sampled ancestors as well as estimation of the probability that an individual will be removed from the process when it is sampled. We show that sampled ancestor birth-death models where all samples come from different time points are non-identifiable and thus require one parameter to be known in order to infer other parameters. We apply this method to epidemiological data, where the possibility of sampled ancestors enables us to identify individuals that infected other individuals after being sampled and to infer fundamental epidemiological parameters. We also apply the method to infer divergence times and diversification rates when fossils are included among the species samples, so that fossilisation events are modelled as a part of the tree branching process. Such modelling has many advantages, as argued in the literature. The sampler is available as an open-source BEAST2 package (https://github.com/gavryushkina/sampled-ancestors). Comment: 34 pages (including Supporting Information), 8 figures, 1 table. Part of the work presented at Epidemics 2013 and The 18th Annual New Zealand Phylogenomics Meeting, 201